-
Notifications
You must be signed in to change notification settings - Fork 3k
AWS: Implement SupportsRecoveryOperations for S3FileIO #10721
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS: Implement SupportsRecoveryOperations for S3FileIO #10721
Conversation
3d59a66 to
6554f38
Compare
| s3.putBucketVersioning( | ||
| PutBucketVersioningRequest.builder() | ||
| .bucket(bucketName) | ||
| .versioningConfiguration( | ||
| VersioningConfiguration.builder().status(BucketVersioningStatus.ENABLED).build()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simplest way for integration tests seemed to be just to enable bucket versioning for the bucket and then on cleanup, ensure deletion of every object version to make sure no garbage is left over.
rdblue
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Thanks, @amogh-jahagirdar!
6554f38 to
ea79984
Compare
| ListObjectVersionsIterable response = | ||
| client() | ||
| .listObjectVersionsPaginator( | ||
| ListObjectVersionsRequest.builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we're using a mixture of patterns (builder/request object). Could we just switch this to:
builder -> builder.bucket(location.bucket()).prefix(location.key())like we do below?
|
|
||
| Optional<ObjectVersion> latestVersion = | ||
| response.versions().stream().max(Comparator.comparing(ObjectVersion::lastModified)); | ||
| return latestVersion.map(version -> recoverObject(version, location.bucket())).orElse(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to put a check here for version.isLatest() just to protect against being called on an object this the latest version. It's unlikely, but it could actually recover and older version since there is no direct check here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be my poor naming, this latestVersion is the version we want to recover to.
For the "isLatest()" check S3 considers deletion markers .
isLatest will only be true for the deletion marker version in the recovery case.
The version that we want to recover to is not the latest version based on S3's consideration of deletion markers; it's the version just before that which the current check will do based on the last modified of the versions (deletions are not included in response.versions().
I'll call this variable recoveryVersion and add an inline comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can skip the recovery (i.e copy) if recoveryVersion.isLatest() is true as well, this will help the case when we are trying to recover an object which is not deleted yet, so it will not have any deletionMarker
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that seems like a simple check we can do to avoid unnecessary copies, updated! I didn't go down the route of comparing the latest deletion marker with the version to recover to, because inverting the logic to just verify that the version to recover to is not the latest is simple; and we know that "version to recover" cannot represent a deletion marker due to the guarantees provided by the S3 api.
danielcweeks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments.
c4c8771 to
6e3be14
Compare
6e3be14 to
c1124e1
Compare
singhpk234
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Thanks @amogh-jahagirdar for taking up this up !
| if (version.isLatest()) { | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] we can have an integ test trying to restore an existing object doesn't leads to creating a new version of obj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed this, sure I can do a follow on to verify that we skip the recovery if the latest version is not a deletion marker.
This change implements the SupportsRecoveryOperations mixin for S3FileIO. The implementation is a best effort recovery which first determines the latest object version, and if that is found, then performs the actual recovery by performing an s3#copy operation.
This can be used as a primitive in repair procedures that the community is working on or even simply work standalone in someone's own logic if they want to correct their table state by recovering files which may have accidentally removed (assuming their bucket has versioning enabled)